TREC-9 Cross-Language Information Retrieval (English-Chinese) Overview

Authors

  • Fredric C. Gey
  • Aitao Chen

Abstract

UC DATA and SIMS, University of California, Berkeley; e-mail: gey@ucdata.berkeley.edu, aitao@sims.berkeley.edu

Sixteen groups participated in the TREC-9 cross-language information retrieval track, which focused on retrieving Chinese-language documents in response to 25 English queries. A variety of CLIR approaches were tested. A rich set of experiments measured the utility of various resources, such as machine translation and parallel corpora. The experiments also measured pre- and post-translation query expansion using pseudo-relevance feedback.

1 Introduction

For TREC-9 the cross-language information retrieval task was to use English queries to retrieve Chinese documents. For multilingual information access, TREC-9 marked the seventh year in which non-English document retrieval was tested and evaluated, and the fourth year of cross-language information retrieval experiments. In TREC-3, four groups tested retrieval of 25 queries against a Mexican newspaper corpus. Spanish-language retrieval was evaluated in three TRECs:

- TREC-3.
- TREC-4, with another 25 queries for the same Mexican corpus.
- TREC-5, where a European Spanish corpus was used.

In TREC-5 a Chinese-language track was introduced. It used newspaper (People's Daily) and newswire (XinHua) sources from the People's Republic of China, and 25 Chinese queries supplied with English translations. The TREC-5 corpus was encoded in the GB character set used for simplified Chinese in the PRC. Chinese monolingual experiments on this collection were done in TREC-5 and TREC-6. They sparked serious research into Chinese text segmentation, using dictionary methods as well as statistical methods based on measures such as mutual information. These methods have been compared with simple overlapping bigram segmentation for monolingual Chinese retrieval.
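
Overlapping bigram segmentation avoids word-boundary detection altogether by indexing every pair of adjacent Chinese characters. The following minimal Python sketch is illustrative, not the procedure of any TREC participant. It makes three assumptions:

- The GB-encoded text has already been decoded to Unicode.
- Only the CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as Chinese.
- Isolated single characters are kept as unigrams.

The function name `overlapping_bigrams` is hypothetical.

```python
import re

# Runs of characters from the CJK Unified Ideographs block (U+4E00..U+9FFF);
# punctuation, digits, spaces and Latin text end a run.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def overlapping_bigrams(text: str) -> list[str]:
    """Return the overlapping character bigrams of each Chinese run in text."""
    tokens: list[str] = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            # An isolated character has no neighbour; index it as a unigram.
            tokens.append(run)
        else:
            # A run of n characters yields n - 1 overlapping bigrams.
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


if __name__ == "__main__":
    print(overlapping_bigrams("人民日报，新华社"))
    # → ['人民', '民日', '日报', '新华', '华社']
```

Queries are segmented the same way as documents. A query containing 日报 therefore matches any document with that character pair. The cost is that spurious pairs such as 民日, which straddle the boundary between 人民 and 日报, are indexed as well.
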
The TREC-6, TREC-7 and TREC-8 conferences had cross-language tracks that focused on European languages (English, French, German, and later Italian). After TREC-8, the evaluation of European-language retrieval moved to Europe with the Cross-Language Evaluation Forum (CLEF), first held in Lisbon in September 2000 [9].

Similar resources

English-Chinese Cross-Language IR Using Bilingual Dictionaries

This report describes the English-Chinese cross-language experiments at Berkeley for the TREC-9 Cross-Language Information Retrieval track. We present a simple and effective Chinese word segmentation method and compare the cross-language retrieval performance of two bilingual dictionaries for query translation.
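
In its simplest form, dictionary-based query translation replaces each English query term with every Chinese translation the bilingual dictionary lists for it. The sketch below shows only this generic baseline. It is not the Berkeley method, and it does not use either of the dictionaries the report compares. The three-entry `TOY_LEXICON` and the function name `translate_query` are hypothetical. In a real run, the Chinese output would then be segmented the same way as the documents.

```python
# Hypothetical three-entry English-to-Chinese lexicon; a real run would load a
# full bilingual dictionary instead.
TOY_LEXICON: dict[str, list[str]] = {
    "china": ["中国"],
    "central": ["中央"],
    "bank": ["银行", "河岸"],  # financial bank, river bank
}


def translate_query(query: str, lexicon: dict[str, list[str]]) -> list[str]:
    """Substitute every listed translation for each (lower-cased) query term.

    Terms with no dictionary entry are kept as-is; they can only match
    documents that contain the same Latin-script string.
    """
    translated: list[str] = []
    for term in query.lower().split():
        translated.extend(lexicon.get(term, [term]))
    return translated


if __name__ == "__main__":
    print(translate_query("China central bank", TOY_LEXICON))
    # → ['中国', '中央', '银行', '河岸']
```

Keeping all translations favours recall, but it lets wrong senses dilute the query. Here 河岸 (river bank) rides along with 银行. Choosing among candidate translations, as in the "lookup with disambiguation" approach in the PIRCS summary, is one way to address this.
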

CINDOR TREC-9 English-Chinese Evaluation

MNIS-TextWise Labs participated in the TREC-9 Chinese Cross-Language Information Retrieval track. The focus of our research for this participation has been on rapidly adding Chinese capabilities to CINDOR using tools for automatically generating a Chinese Conceptual Interlingua from existing lexical resources. For the TREC-9 evaluation we also built a version of our system which loosely integra...

TREC-9 CLIR Experiments at MSRCN

In TREC-9, we participated in the English-Chinese Cross-Language Information Retrieval (CLIR) track. Our work involved two aspects: finding good methods for Chinese IR, and finding effective translation means between English and Chinese. On Chinese monolingual retrieval, we investigated the use of different entities as indexes, pseudo-relevance feedback, and length normalization, and examined th...
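
Pseudo-relevance feedback assumes that the top-ranked documents of an initial run are relevant and adds terms drawn from them to the query. The MSRCN summary lists it for Chinese monolingual retrieval, and the overview abstract mentions it for pre- and post-translation query expansion. The following minimal sketch makes three simplifying assumptions:

- Documents are already segmented into term lists and ranked.
- Expansion terms are chosen by raw frequency in the top documents.
- The name `prf_expand` and its default cut-offs are hypothetical, not any participant's settings.

```python
from collections import Counter


def prf_expand(
    query_terms: list[str],
    ranked_docs: list[list[str]],
    top_docs: int = 10,
    top_terms: int = 20,
) -> list[str]:
    """Expand a query with frequent terms from the top-ranked documents.

    ranked_docs holds the already-segmented documents of an initial
    retrieval run, best first; the first top_docs are treated as relevant.
    """
    counts: Counter[str] = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(doc)
    # Most frequent first (ties keep first-seen order); skip terms already
    # in the query, then keep at most top_terms new ones.
    expansion = [t for t, _ in counts.most_common() if t not in query_terms]
    return query_terms + expansion[:top_terms]


if __name__ == "__main__":
    docs = [["经济", "改革", "开放"], ["改革", "政策"], ["体育"]]
    print(prf_expand(["经济"], docs, top_docs=2, top_terms=2))
    # → ['经济', '改革', '开放']
```

Raw frequency tends to pull in very common words. Practical systems usually weight candidates first, for example by tf-idf or Rocchio-style averaging. For pre-translation expansion, the same routine would run over an English collection before the query is translated. For post-translation expansion, it would run over the Chinese collection afterwards.
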

English-Chinese Cross-Language Retrieval based on a Translation Package

An inexpensive COTS translation package, augmented with a downloadable bilingual dictionary, was employed for a study of English-Chinese cross-language information retrieval (CLIR) using the query translation approach. The experimental setting involved the 170 MB Chinese collections and 54 queries of TREC and their relevance judgment, and our PIRCS bi-lingual retrieval system. With some standar...

TREC-9 Cross Language, Web and Question-Answering Track Experiments using PIRCS

In TREC-9, we participated in the English-Chinese Cross Language, 10GB Web data ad-hoc retrieval as well as the Question-Answering tracks, all using automatic procedures. All these tracks were new for us. For Cross Language track, we made use of two techniques of query translation: MT software and bilingual wordlist lookup with disambiguation. The retrieval lists from them were then combined as...

Publication date: 2000